Label Shift Estimators for Non-Ignorable Missing Data
We consider the problem of estimating the mean of a random variable Y subject
to non-ignorable missingness, i.e., where the missingness mechanism depends on
Y. We connect the auxiliary proxy variable framework for non-ignorable
missingness (West and Little, 2013) to the label shift setting (Saerens et al.,
2002). Exploiting this connection, we construct an estimator for non-ignorable
missing data that uses high-dimensional covariates (or proxies) without the
need for a generative model. In synthetic and semi-synthetic experiments, we
study the behavior of the proposed estimator, comparing it to commonly used
ignorable estimators in both well-specified and misspecified settings.
Additionally, we develop a score to assess how consistent the data are with the
label shift assumption. We use our approach to estimate disease prevalence
using a large health survey, comparing ignorable and non-ignorable approaches.
We show that failing to account for non-ignorable missingness can have profound
consequences on conclusions drawn from non-representative samples.
Comment: 8 pages, 5 figures
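
To make the connection between label shift and non-ignorable missingness concrete, here is a minimal sketch for a binary Y and a single binary proxy X. It is not the estimator proposed in the paper. It is a simple moment-matching correction that assumes the proxy distribution given Y is the same for respondents and non-respondents (the label shift assumption). The function name `label_shift_mean` and the toy data are hypothetical.

```python
def label_shift_mean(y_obs, x_obs, x_mis):
    """Estimate E[Y] for binary Y when Y is missing for some units.

    Assumption (label shift): p(x | y) is identical for units with
    observed Y and units with missing Y. Illustrative sketch only.
    """
    n_obs, n_mis = len(y_obs), len(x_mis)

    # Estimate p(x = 1 | y) from the complete cases.
    # Both classes must appear among the complete cases.
    x_given_pos = [x for x, y in zip(x_obs, y_obs) if y == 1]
    x_given_neg = [x for x, y in zip(x_obs, y_obs) if y == 0]
    a1 = sum(x_given_pos) / len(x_given_pos)
    a0 = sum(x_given_neg) / len(x_given_neg)
    if a1 == a0:
        raise ValueError("proxy carries no information about Y")

    # Among missing cases: q(x = 1) = a1 * w + a0 * (1 - w),
    # where w = q(y = 1). Solve for w and clip to [0, 1].
    q_x = sum(x_mis) / n_mis
    w = (q_x - a0) / (a1 - a0)
    w = min(1.0, max(0.0, w))

    # Combine the observed mean with the estimated mean among missing cases.
    mean_obs = sum(y_obs) / n_obs
    return (n_obs * mean_obs + n_mis * w) / (n_obs + n_mis)


# Toy data (hypothetical): 50 respondents, 50 non-respondents.
y_obs = [1] * 10 + [0] * 40
x_obs = [1] * 8 + [0] * 2 + [1] * 8 + [0] * 32
x_mis = [1] * 25 + [0] * 25

complete_case_mean = sum(y_obs) / len(y_obs)
estimate = label_shift_mean(y_obs, x_obs, x_mis)
```

In this toy data the complete-case mean is 0.2, while the corrected estimate is about 0.35, because the proxy is far more often positive among non-respondents than among respondents.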
A unifying representation for a class of dependent random measures
We present a general construction for dependent random measures based on
thinning Poisson processes on an augmented space. The framework is not
restricted to dependent versions of a specific nonparametric model, but can be
applied to all models that can be represented using completely random measures.
Several existing dependent random measures can be seen as specific cases of
this framework. Interesting properties of the resulting measures are derived
and the efficacy of the framework is demonstrated by constructing a
covariate-dependent latent feature model and topic model that obtain superior
predictive performance.
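
The thinning idea can be sketched as follows, under strong simplifying assumptions that are not taken from the paper: a finite-rate Poisson process instead of an infinite-activity completely random measure, a one-dimensional covariate, and a Gaussian-shaped retention probability. Each atom carries an extra coordinate `mu` in covariate space (the augmented space), and an atom is kept at covariate `x` with a probability that decays with the distance between `x` and `mu`. All function names and parameters here are hypothetical.

```python
import math
import random


def sample_poisson_count(rate, rng):
    """Draw N ~ Poisson(rate) by counting unit-rate arrivals in [0, rate)."""
    count, t = 0, rng.expovariate(1.0)
    while t < rate:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_atoms(rate, rng):
    """Sample atoms (theta, mu, weight) of a finite-rate Poisson process.

    theta: location in parameter space, uniform on [0, 1)
    mu:    auxiliary location in covariate space, uniform on [0, 1)
    weight: positive mass, exponential with mean 1 (an assumption)
    """
    n = sample_poisson_count(rate, rng)
    return [(rng.random(), rng.random(), rng.expovariate(1.0)) for _ in range(n)]


def thin(atoms, x, bandwidth, rng):
    """Keep each atom independently with probability exp(-(x - mu)^2 / (2 h^2))."""
    kept = []
    for theta, mu, weight in atoms:
        p_keep = math.exp(-((x - mu) ** 2) / (2.0 * bandwidth ** 2))
        if rng.random() < p_keep:
            kept.append((theta, mu, weight))
    return kept


rng = random.Random(0)
atoms = sample_atoms(20.0, rng)

# Measures at two covariate values share the same underlying atoms,
# so nearby covariates tend to retain overlapping subsets.
measure_at_03 = thin(atoms, 0.3, 0.1, rng)
measure_at_07 = thin(atoms, 0.7, 0.1, rng)
```

Because both thinned measures are built from one shared set of atoms, they are dependent by construction, and the retention kernel controls how quickly that dependence decays across covariate values.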